Homework 1

Author

Liam Smith

Professional wrestling, while not everyone’s cup of tea, is big business. What started as a carnival act has turned into a global entertainment industry. Netflix recently started showing Monday Night Raw, a program from the biggest North American wrestling company, WWE – this deal is reportedly worth $5 billion. Like any large entity, WWE is not without competition, drama, and scandal.

General Tips

This is very much a step-by-step process. Don’t go crazy trying to get everything done with as few lines as possible. Read the documentation for the AlphaVantage api! Carefully explore the pages from cagematch. There isn’t a need to get too fancy with anything here – just go with simple function and all should be good. Don’t print comments, but use normal text for explanations.

Step 1

In the calls folder, you’ll find 4 text files – these are transcripts from quarterly earnings calls. Read those files in (glob.glob will be very helpful here), with appropriate column names for ticker, quarter, and year columns;this should be done within a single function. Perform any data cleaning that you find necessary.

import glob as glob
import pandas as pd
import re

# Get file paths to calls
calls = glob.glob("G:/My Drive/MSBA/Unstructured/unstructured-notes/hw1/calls/*")

# Read in the calls into one dataframe
calls_df = pd.DataFrame(columns = ['text', 'ticker', 'quarter', 'year'])
for call in calls:
  table = pd.read_table(call, header = None, names = ['text'])
  table['ticker'] = re.search(r'(?<=calls\\).*(?=_q)', call).group(0).upper()
  table['quarter'] = re.search(r'(?<=calls\\.{3}_).*(?=_)', call).group(0).upper()
  table['year'] = re.search(r'(?<=q.{1}_).{4}', call).group(0)
  calls_df = pd.concat([calls_df, table])

Step 2

Use the AlphaVantage api to get daily stock prices for WWE and related tickers for the last 5 years – pay attention to your data. You cannot use any AlphaVantage packages (i.e., you can only use requests to grab the data). Tell me about the general trend that you are seeing. I don’t care which viz package you use, but plotly is solid and plotnine is good for ggplot2 users.

import requests 
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook"

# Request inputs
api_key = '8ETIBM8H1NQL1ZWG'
tickers = ('TKO', 'WWE')

# Make request and save data
daily_prices = {}
for ticker in tickers:
  # Get data
  url = f'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol={ticker}&outputsize=full&apikey={api_key}'
  response = requests.get(url)
  data = response.json()
  # Extract time series data
  time_series = data['Time Series (Daily)']
  # Create DataFrame
  df = pd.DataFrame.from_dict(time_series, orient='index')
  df = df.rename(columns={
      '1. open': 'open',
      '2. high': 'high',
      '3. low': 'low',
      '4. close': 'close',
      '5. volume': 'volume'
  })
  df.index = pd.to_datetime(df.index)
  df = df.sort_index()
  # Save DataFrame to a dictionary
  daily_prices[ticker] = df

# Convert dictionary to dataframe
wwe_price = pd.DataFrame.from_dict(daily_prices['WWE'])
wwe_price['ticker'] = 'WWE'
tko_price = pd.DataFrame.from_dict(daily_prices['TKO'])
tko_price['ticker'] = 'TKO'
stock_price = pd.concat([wwe_price, tko_price])
# Select only last five years
stock_price = stock_price[stock_price.index > '2020-01-01']
stock_price['close'] = stock_price['close'].astype(float)

# Plot
fig = px.line(stock_price, x=stock_price.index, y='close', color='ticker', title='WWE/TKO Stock Price Over Time').update_layout(
 annotations =[dict(
    x = '2023-09-12',
    y = 103.05,
    text = 'TKO Takes over WWE',
    showarrow = True,
    arrowhead = 1,
    ax = 10,
    ay = -70,
    arrowwidth = 2,
    arrowcolor = 'white',
    bgcolor = 'black',
    bordercolor = 'white',
  )],
  xaxis_title = 'Date',
  yaxis_title = 'Stock Price (USD)',
  paper_bgcolor="black",
  plot_bgcolor="black",
  font=dict(color="white")
)

fig.show()

Step 3

Just like every other nerdy hobby, professional wrestling draws dedicated fans. Wrestling fans often go to cagematch.net to leave reviews for matches, shows, and wrestlers. The following link contains the top 100 matches on cagematch: https://www.cagematch.net/?id=111&view=statistics

from bs4 import BeautifulSoup

link = 'https://www.cagematch.net/?id=111&view=statistics'
cagematch = requests.get(link)
cagematch_soup = BeautifulSoup(cagematch.content, 'html.parser')

cagematch_top100 = cagematch_soup.select(
  '.TRow1, .TRow2'
)

# Create empty dataframe to populate
matches = []
# Loop through each match and pull out information
for match in cagematch_top100:
  columns = match.find_all('td')
  rank = columns[0].text.strip()
  date = columns[1].text.strip()
  wrestlers = columns[3].text.strip().split(r' vs. | vs | & |, ')
  won = columns[4].text.strip()
  match_type = columns[5].text.strip()
  rating = columns[6].text.strip()
  votes = columns[7].text.strip()
  # Handle multiple wrestlers with different delimiters
  wrestler_list = re.split(r' vs. | vs | & |, ', wrestlers[0])
  # Bring in promotion as well
  try:
    promotion = match.find_all('img')[0]['alt']
  except IndexError:
    promotion = 'N/A'
  matches.append([rank, date, promotion, wrestler_list, won, match_type, rating, votes])

cagematch_df = pd.DataFrame(matches, columns=['Rank', 'Date', 'Promotion', 'Wrestlers', 'Won', 'Match Type', 'Rating', 'Votes'])
  • What is the correlation between WON ratings and cagematch ratings?
for i in range(len(cagematch_df)):
  won = cagematch_df.loc[i, 'Won']
  stars = won.count('*')
  fraction_match = re.search(r'(\d/\d)', won)
  # If there is a fraction at the end, convert it
  fraction_value = 0
  if fraction_match:
    fraction_str = fraction_match.group(1)
    fraction_value = eval(fraction_str)
  cagematch_df.loc[i, 'Won'] = float(stars + fraction_value)
cagematch_df['Rating'] = cagematch_df['Rating'].astype(float)
# Remove rows where 'Won' is 0
cagematch_df_corr = cagematch_df[cagematch_df['Won'] != 0]

# Calculate and print the correlation
print(cagematch_df_corr['Won'].corr(cagematch_df_corr['Rating']))
0.35132323865117493

** Which wrestler has the most matches in the top 100?

wrestler_counts = cagematch_df['Wrestlers'].explode().value_counts()
print(wrestler_counts.sort_values(ascending=False).head(1))
Wrestlers
Kenny Omega    15
Name: count, dtype: int64

*** Which promotion has the most matches in the top 100?

promotion_counts = cagematch_df['Promotion'].value_counts()
print(promotion_counts.sort_values(ascending=False).head(1))
Promotion
New Japan Pro Wrestling    34
Name: count, dtype: int64

**** What is each promotion’s average WON rating?

print(cagematch_df.groupby('Promotion')['Won'].mean())
Promotion
All Elite Wrestling                       5.5625
All Japan Pro Wrestling                 4.979167
All Japan Women's Pro-Wrestling           3.6875
DDT Pro Wrestling                            0.0
GAEA Japan                                   0.0
JTO                                         4.75
Japanese Women Pro-Wrestling Project         5.0
Lucha Underground                            0.0
New Japan Pro Wrestling                 5.382353
Pro Wrestling NOAH                      4.785714
Ring Of Honor                             4.3125
Total Nonstop Action Wrestling               5.0
World Championship Wrestling                 5.0
World Wonder Ring Stardom                   2.75
World Wrestling Entertainment           4.892857
Name: Won, dtype: object

***** Select any single match and get the comments and ratings for that match into a data frame.

# Ric Flair vs. Ricky Steamboat
individual_link = 'https://www.cagematch.net/?id=111&nr=808&page=99'
individual = requests.get(individual_link)
individual_soup = BeautifulSoup(individual.content, 'html.parser')
ric_vs_ricky = individual_soup.select(
  '.CommentContents'
)
# Pull out what we want
ric_vs_ricky = [ric_vs_ricky[i].getText() for i in range(len(ric_vs_ricky))]
ric_vs_ricky_df = pd.DataFrame(columns = ['rating', 'comment'])
for comment in ric_vs_ricky:
  rating = re.search(r'\[(.*?)\]', comment)
  if rating:
    rating = rating.group(0)
  else:
    rating = 'N/A'
  comment = re.sub(r'\[(.*?)\]', '', comment)
  ric_vs_ricky_df = pd.concat([ric_vs_ricky_df, pd.DataFrame({'rating': [rating], 'comment': [comment]})])
# Assign index
ric_vs_ricky_df.index = range(len(ric_vs_ricky_df))
ric_vs_ricky_df.head()
rating comment
0 [10.0] "Bell to bell, this is an amazing display of ...
1 [10.0] "This match is everything, my favorite of a l...
2 [10.0] "Proof that for a 5 stars match you don't nee...
3 [10.0] "Two out of two in the "nailed it" column for...
4 [10.0] "What a first match to comment and rate on he...

Step 4

You can’t have matches without wrestlers. The following link contains the top 100 wrestlers, according to cagematch: https://www.cagematch.net/?id=2&view=statistics

link = 'https://www.cagematch.net/?id=2&view=statistics'
top_wrestlers = requests.get(link)
top_wrestlers_soup = BeautifulSoup(top_wrestlers.content, 'html.parser')
top_wrestlers = top_wrestlers_soup.select(
  '.TRow1, .TRow2'
)
len(top_wrestlers)

# Empty dataframe for their ID
wrestler_ids = pd.DataFrame(columns = ['Name', 'ID'])
for wrestler in top_wrestlers:
  # Find and pull out ID
  id_cols = wrestler.find_all('td')
  wrestler_id = id_cols[1].find('a')['href']
  wrestler_id = re.search(r'(?<=nr=)\d+', wrestler_id).group(0)
  wrestler_id = int(wrestler_id)
  # Find and pull out name
  name_cols = wrestler.find_all('td')
  name = name_cols[1].text.strip()
  wrestler_ids = pd.concat([wrestler_ids, pd.DataFrame({'Name': [name], 'ID': [wrestler_id]})])

# Clean up dataframe
wrestler_ids.index = range(len(wrestler_ids))

# Go get match stats
wrestler_stats = pd.DataFrame(columns = ['Name', 'ID', 'Matches', 'Wins', 'Losses', 'Draws'])
for id in wrestler_ids['ID']:
  link = f'https://www.cagematch.net/?id=2&nr={id}&page=22'
  wrestler = requests.get(link)
  wrestler_soup = BeautifulSoup(wrestler.content, 'html.parser')
  wrestler = wrestler_soup.select(
    '.InformationBoxContents'
  )
  # Empty list to populate
  stats = []
  for stat in wrestler:
    match = wrestler[0].text.strip()
    win = wrestler[1].text.strip()
    loss = wrestler[2].text.strip()
    draw = wrestler[3].text.strip()
    stats.append([match, win, loss, draw])
  # Append to dataframe
  wrestler_stats = pd.concat([wrestler_stats, pd.DataFrame({'ID': [id], 'Name': wrestler_ids[wrestler_ids['ID'] == id]['Name'].values[0], 'Matches': [match], 'Wins': [win], 'Losses': [loss], 'Draws': [draw]})], ignore_index=True)
# Reset index
wrestler_stats.index = range(len(wrestler_stats))

# Clean up columns
wrestler_stats['Matches'] = wrestler_stats['Matches'].astype(int)
wrestler_stats['Win Count'] = wrestler_stats['Wins'].str.extract(r'(\d+)').astype(int)
wrestler_stats['Win Percentage (%)'] = wrestler_stats['Wins'].str.extract(r'\(([^)]+)\)')
wrestler_stats['Win Percentage (%)'] = wrestler_stats['Win Percentage (%)'].str.replace('%', '').astype(float)
wrestler_stats['Loss Count'] = wrestler_stats['Losses'].str.extract(r'(\d+)').astype(int)
wrestler_stats['Loss Percentage (%)'] = wrestler_stats['Losses'].str.extract(r'\(([^)]+)\)')
wrestler_stats['Loss Percentage (%)'] = wrestler_stats['Loss Percentage (%)'].str.replace('%', '').astype(float)
wrestler_stats['Draw Count'] = wrestler_stats['Draws'].str.extract(r'(\d+)').astype(int)
wrestler_stats['Draw Percentage (%)'] = wrestler_stats['Draws'].str.extract(r'\(([^)]+)\)')
wrestler_stats['Draw Percentage (%)'] = wrestler_stats['Draw Percentage (%)'].str.replace('%', '').astype(float)
wrestler_stats = wrestler_stats.drop(columns=['Wins', 'Losses', 'Draws'])

*** Of the top 100, who has wrestled the most matches?

print(wrestler_stats.sort_values(by='Matches', ascending=False).head(1))
         Name    ID  Matches  Win Count  Win Percentage (%)  Loss Count  \
95  Ric Flair  1091     4999       2553                51.1        1971   

    Loss Percentage (%)  Draw Count  Draw Percentage (%)  
95                 39.4         475                  9.5  

***** Of the top 100, which wrestler has the best win/loss?

print(wrestler_stats.sort_values(by='Win Percentage (%)', ascending=False).head(1))
             Name    ID  Matches  Win Count  Win Percentage (%)  Loss Count  \
25  Gene Okerlund  1383        4          4               100.0           0   

    Loss Percentage (%)  Draw Count  Draw Percentage (%)  
25                  0.0           0                  0.0  

Step 5

With all of this work out of the way, we can start getting down to strategy.

First, what talent should WWE pursue? Advise carefully.
TKO Holdings is a conglomerate that was created when the WWE merged with Zuffa, LLC., the parent of UFC. Moving forward, TKO should look worldwide to acquiring new talent in order to keep its customers satisfied. According to Cagematch.net, the top two wrestling promotions of all time are All Elite Wrestling and New Japan Pro Wrestling. In 2024 AEW ranked behind the WWE and UFC as the third largest combat sports organization in the world. TKO should look to acquire talent from AEW, or look for a partnership with them in order to continue growing their North American presence. New Japan Pro Wrestling offers an exciting opportunity to expand abroad. Looking at the Cagematch.net rankings shows that the top 9 wrestlers are from Japan. Similar to AEW, TKO should look to acquire talent from New Japan Pro Wrestling or form a partnership with them in order to expand their global presence.

Second, reconcile what you found in steps 3 and 4 with Netflix’s relationship with WWE. Use the data from the following page to help make your case: https://wrestlenomics.com/tv-ratings/
Starting in January of 2025, the WWE and Netflix entered into a parentership that gives Netflix exclusive streaming rights to WWE’s Monday Night Raw. Raw is one of WWE’s premier events, along with Smackdown, and this shift to Netflix marks a new era in the history of WWE. The last episode before the move to Netflix brought in 1.5M+ viewers, while the first Netflix stream brought in 4.9M+ viewers. By putting Raw on Netflix, international viewers are offered an easy way to replay the event, even if they can not watch it live. As shown in steps 3 and 4, some of the highest level of wrestling talent comes from Japan, with many of the top events coming from New Japan Pro Wrestling. If the WWE can put out a product with a similar level of enjoyment for the fans, they can expect their viewership to grow internationally. If growth continues to happen, the WWE can consider expanding their partnership with Netflix, bringing their other premier events to the streaming platform.

Third, do you have any further recommendations for WWE?
Moving forward, it is apparent to me that the WWE should have two objectives: expanding their viewership through the Netflix partnership and advancing into the international space. In some other sports there has been success around having events where countries are pitted against each other. I think that if TKO does not see an acquisition of New Japan Pro Wrestling as a feasible option, they should consider a massive crossover event. This event could be held in Japan and feature the top wrestlers from the WWE and New Japan Pro Wrestling. This event would be a great way to introduce the WWE to the Japanese market and could be a great way to expand their international presence. By having this event on Netflix, the WWE can hit on both of these objectives.